Explicitly precompile register_llvm_rules #1303

Merged: 1 commit, Feb 21, 2024

@vchuravy (Member) commented Feb 21, 2024

Reduces load time on my system significantly, from ~10 s to ~0.3 s.
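
For readers unfamiliar with the technique named in the title, here is a minimal sketch of an explicit precompile directive. It is not the actual change in this PR: the module `RulesDemo`, the table `RULES`, and the function `register_rules!` are hypothetical stand-ins. The only real API used is Julia's built-in `precompile(f, argtypes)`.

```julia
module RulesDemo

# Hypothetical rule table; a stand-in for whatever a real registration routine fills in.
const RULES = Dict{Symbol,Int}()

# Hypothetical registration routine that is called from __init__.
function register_rules!(table::Dict{Symbol,Int})
    for (i, name) in enumerate((:sin, :cos, :exp))
        table[name] = i
    end
    return table
end

# Explicit directive: request compilation of this method for these argument
# types, so that (in a package) the result can be stored in the precompile
# cache rather than being compiled when __init__ first calls it.
precompile(register_rules!, (Dict{Symbol,Int},))

# __init__ runs every time the module is loaded.
function __init__()
    register_rules!(RULES)
end

end # module
```

In a real package the directive sits at the top level of the module, so it executes while the package is being precompiled. Whether the cached code is still usable at load time depends on it not having been invalidated.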

@vchuravy (Member, Author)

Before:

julia> @time_imports import Enzyme
      2.8 ms  EnzymeCore
      3.2 ms  CEnum
     22.0 ms  Preferences
      0.2 ms  LazyArtifacts
      0.3 ms  JLLWrappers
               ┌ 8.4 ms LLVMExtra_jll.__init__() 52.03% compilation time
     42.5 ms  LLVMExtra_jll 88.14% compilation time
               ┌ 0.1 ms LLVM.__init__() 
    114.3 ms  LLVM
               ┌ 3.9 ms Enzyme_jll.__init__() 
      4.7 ms  Enzyme_jll
      0.3 ms  ExprTools
               ┌ 0.0 ms TimerOutputs.__init__() 
     50.7 ms  TimerOutputs
      0.3 ms  Scratch
               ┌ 0.3 ms GPUCompiler.__init__() 
    525.8 ms  GPUCompiler 0.01% compilation time
      0.2 ms  Reexport
      0.3 ms  StructIO
               ┌ 0.0 ms ObjectFile.__init__() 
     22.5 ms  ObjectFile
               ┌ 0.6 ms Enzyme.API.__init__() 
               ├ 0.2 ms Enzyme.Compiler.JIT.__init__() 
               ├ 0.0 ms Enzyme.Compiler.FFI.BLASSupport.__init__() 
               ├ 1.7 ms Enzyme.Compiler.FFI.__init__() 
               ├ 10703.1 ms Enzyme.Compiler.__init__() 100.00% compilation time
  11213.6 ms  Enzyme 95.45% compilation time

After:

julia> @time_imports import Enzyme
      2.8 ms  EnzymeCore
      3.2 ms  CEnum
     22.2 ms  Preferences
      0.2 ms  LazyArtifacts
      0.3 ms  JLLWrappers
               ┌ 8.1 ms LLVMExtra_jll.__init__() 52.60% compilation time
     40.8 ms  LLVMExtra_jll 88.25% compilation time
               ┌ 0.1 ms LLVM.__init__() 
    116.6 ms  LLVM
               ┌ 3.9 ms Enzyme_jll.__init__() 
      4.6 ms  Enzyme_jll
      0.3 ms  ExprTools
               ┌ 0.0 ms TimerOutputs.__init__() 
     51.3 ms  TimerOutputs
      0.3 ms  Scratch
               ┌ 0.3 ms GPUCompiler.__init__() 
    344.5 ms  GPUCompiler 0.02% compilation time
      0.2 ms  Reexport
      0.3 ms  StructIO
               ┌ 0.0 ms ObjectFile.__init__() 
     23.0 ms  ObjectFile
               ┌ 0.6 ms Enzyme.API.__init__() 
               ├ 0.2 ms Enzyme.Compiler.JIT.__init__() 
               ├ 0.0 ms Enzyme.Compiler.FFI.BLASSupport.__init__() 
               ├ 1.7 ms Enzyme.Compiler.FFI.__init__() 
               ├ 0.0 ms Enzyme.Compiler.__init__() 
    301.6 ms  Enzyme 0.03% compilation time

@vchuravy (Member, Author)

Fixes #776

@vchuravy vchuravy merged commit cfb15fc into main Feb 21, 2024
41 of 45 checks passed
@vchuravy vchuravy deleted the vc/fix_loading branch February 21, 2024 20:40
@KristofferC

#776 is mostly about the load time when anything in here is invalidated. When that issue was opened, the package loaded just as quickly on its own as shown here; it only got slow in combination with other packages.

@vchuravy (Member, Author)

Is that still the case? I could additionally break the call edge by using invokelatest.
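
A minimal sketch of what breaking the call edge with `invokelatest` could look like. The names `register_rules`, `init_direct`, and `init_latest` are hypothetical and not taken from Enzyme; `Base.invokelatest` is the real Base function.

```julia
# Hypothetical stand-in for the registration routine.
register_rules() = :registered

# Direct call: the compiled caller is tied to the callee through inference,
# so invalidating the callee also invalidates this caller.
function init_direct()
    return register_rules()
end

# Call through invokelatest: the callee is looked up dynamically in the latest
# world age at run time, so the caller is compiled without depending on the
# callee's inferred code.
function init_latest()
    return Base.invokelatest(register_rules)
end
```

The trade-off is a dynamic dispatch and an un-inferred return type at the call site, which is usually negligible for a one-time initialization call.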

@KristofferC

julia> @time using Enzyme
  0.231514 seconds (259.79 k allocations: 20.705 MiB, 3.36% gc time, 0.88% compilation time)

######

julia> struct T end

julia> Base.show(io::IO, t::Type{T}) = print(io, "heh")

julia> @time using Enzyme
 10.714881 seconds (26.53 M allocations: 1.836 GiB, 5.36% gc time, 97.91% compilation time: 26% of which was recompilation)
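
The slowdown above comes from method invalidation: defining a new, more specific method can force already-compiled callers to be recompiled. A minimal sketch of the mechanism in isolation, using hypothetical functions `describe` and `caller` that have nothing to do with Enzyme:

```julia
# Generic fallback method.
describe(x) = "generic"

# caller is compiled against the methods of describe that exist right now.
caller(x) = describe(x)

first_result = caller(1)     # compiles caller(::Int); dispatches to describe(::Any)

# A more specific method appears later. Compiled code for caller(::Int)
# assumed describe(::Int) resolved to the generic method, so it is invalidated.
describe(x::Int) = "int"

second_result = caller(1)    # caller(::Int) is recompiled and sees the new method
```

In the transcript, the new `Base.show` method for `Type{T}` plays the role of `describe(x::Int)`: code that had been compiled against the previous set of `show` methods is recompiled while Enzyme loads.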

@KristofferC

To me, something like KristofferC@4246ddb#diff-21145f7299f4174cfb41f7c0c845a0ee798fc065bae0a352171552dc4b810508R1033 feels correct; it just has to be ensured that the actual callbacks that end up being run at runtime get properly optimized (but not the code that generates and inserts them).
